Results 1 - 14 of 14
1.
J Am Med Inform Assoc ; 30(6): 1022-1031, 2023 05 19.
Article in English | MEDLINE | ID: covidwho-2265425

ABSTRACT

OBJECTIVE: To develop a computable representation for medical evidence and to contribute a gold standard dataset of annotated randomized controlled trial (RCT) abstracts, along with a natural language processing (NLP) pipeline for transforming free-text RCT evidence in PubMed into the structured representation. MATERIALS AND METHODS: Our representation, EvidenceMap, consists of 3 levels of abstraction: Medical Evidence Entity, Proposition and Map, to represent the hierarchical structure of medical evidence composition. Randomly selected RCT abstracts were annotated following EvidenceMap based on the consensus of 2 independent annotators to train an NLP pipeline. Via a user study, we measured how the EvidenceMap improved evidence comprehension and analyzed its representative capacity by comparing the evidence annotation with EvidenceMap representation and without following any specific guidelines. RESULTS: Two corpora including 229 disease-agnostic and 80 COVID-19 RCT abstracts were annotated, yielding 12 725 entities and 1602 propositions. EvidenceMap saves users 51.9% of the time compared to reading raw-text abstracts. Most evidence elements identified during the freeform annotation were successfully represented by EvidenceMap, and users gave the enrollment, study design, and study results sections mean 5-scale Likert ratings of 4.85, 4.70, and 4.20, respectively. The end-to-end evaluations of the pipeline show that the evidence proposition formulation achieves F1 scores of 0.84 and 0.86 in the adjusted Rand index score. CONCLUSIONS: EvidenceMap extends the participant, intervention, comparator, and outcome framework into 3 levels of abstraction for transforming free-text evidence from the clinical literature into a computable structure. It can be used as an interoperable format for better evidence retrieval and synthesis and an interpretable representation to efficiently comprehend RCT findings.


Subject(s)
COVID-19 , Comprehension , Humans , Natural Language Processing , PubMed
2.
J Am Med Inform Assoc ; 2022 Oct 18.
Article in English | MEDLINE | ID: covidwho-2236048

ABSTRACT

OBJECTIVE: To identify and characterize clinical subgroups of hospitalized COVID-19 patients. MATERIALS AND METHODS: Electronic health records of hospitalized COVID-19 patients at NewYork-Presbyterian/Columbia University Irving Medical Center were temporally sequenced and transformed into patient vector representations using Paragraph Vector models. K-means clustering was performed to identify subgroups. RESULTS: A diverse cohort of 11,313 patients with COVID-19 and hospitalizations between March 2, 2020 and December 1, 2021 was identified; median [IQR] age: 61.2 [40.3-74.3]; 51.5% female. Twenty subgroups of hospitalized COVID-19 patients, labeled by increasing severity, were characterized by their demographics, conditions, outcomes, and severity (mild-moderate/severe/critical). Subgroup temporal patterns were characterized by the durations in each subgroup, transitions between subgroups, and the complete paths throughout the course of hospitalization. DISCUSSION: Several subgroups had mild-moderate SARS-CoV-2 infections but were hospitalized for underlying conditions (pregnancy, cardiovascular disease (CVD), etc.). Subgroup 7 included solid organ transplant recipients who mostly developed mild-moderate or severe disease. Subgroup 9 had a history of type-2 diabetes, kidney disease, and CVD, and suffered the highest rates of heart failure (45.2%) and end-stage renal disease (80.6%). Subgroup 13 was the oldest (median: 82.7 years) and had mixed severity but high mortality (33.3%). Subgroup 17 had critical disease and the highest mortality (64.6%), with age (median: 68.1 years) being the only notable risk factor. Subgroups 18-20 had critical disease with high complication rates and long hospitalizations (median: 40+ days). All subgroups are detailed in the full text. A chord diagram depicts the most common transitions, and paths with the highest prevalence, longest hospitalizations, lowest and highest mortalities are presented.
Understanding these subgroups and their pathways may aid clinicians in their decisions for better management and earlier intervention for patients.

3.
Stud Health Technol Inform ; 290: 617-621, 2022 Jun 06.
Article in English | MEDLINE | ID: covidwho-1933568

ABSTRACT

Sample size is an important indicator of the power of randomized controlled trials (RCTs). In this paper, we designed a total sample size extractor using a combination of syntactic and machine learning methods, and evaluated it on 300 COVID-19 abstracts (Covid-Set) and 100 generic RCT abstracts (General-Set). To improve the performance, we applied transfer learning from a large public corpus of annotated abstracts. We achieved an average F1 score of 0.73 on the Covid-Set testing set, and 0.60 on the General-Set using exact matches. The F1 scores for loose matches on both datasets were over 0.74. Compared with the state-of-the-art tool, our extractor reports total sample sizes directly and improves F1 scores by at least 4% without transfer learning. We demonstrated that transfer learning improved the sample size extraction accuracy and minimized human labor on annotations.


Subject(s)
COVID-19 , COVID-19/epidemiology , Humans , Machine Learning , Natural Language Processing , Randomized Controlled Trials as Topic , Sample Size
4.
Stud Health Technol Inform ; 290: 309-313, 2022 Jun 06.
Article in English | MEDLINE | ID: covidwho-1933559

ABSTRACT

The rapid growth of clinical trials launched in recent years poses significant challenges for accurate and efficient trial search. Keyword-based clinical trial search engines require users to construct effective queries, which can be a difficult task given complex information needs. In this study, we present an interactive clinical trial search interface that retrieves trials similar to a target clinical trial. It enables user configuration of 13 clinical trial features and 4 metrics (Jaccard similarity, semantic-based similarity, temporal overlap and geographical distance) to measure pairwise trial similarities. Among 1,007 coronavirus disease 2019 (COVID-19) trials conducted in the United States, 91.9% were found to have similar trials with the similarity threshold being 0.85 and 43.8% were highly similar with the threshold 0.95. A simulation study using 3 groups of similar trials curated by COVID-19 clinical trial reviews demonstrates the precision and recall of the search interface.


Subject(s)
COVID-19 , Benchmarking , Data Collection , Humans , Search Engine , Semantics
5.
Stud Health Technol Inform ; 294: 392-396, 2022 May 25.
Article in English | MEDLINE | ID: covidwho-1865422

ABSTRACT

Anecdotally, 38.5% of clinical outcome descriptions in randomized controlled trial publications contain complex text. Existing terminologies are insufficient to standardize outcomes and their measures, temporal attributes, quantitative metrics, and other attributes. In this study, we analyzed the semantic patterns in the outcome text in a sample of COVID-19 trials and presented a data-driven method for modeling outcomes. We conclude that a data-driven knowledge representation can benefit natural language processing of outcome text from published clinical studies.


Subject(s)
COVID-19 , Humans , Natural Language Processing , Semantics
6.
JMIR Public Health Surveill ; 8(5): e35311, 2022 05 24.
Article in English | MEDLINE | ID: covidwho-1862504

ABSTRACT

BACKGROUND: COVID-19 messenger RNA (mRNA) vaccines have demonstrated efficacy and effectiveness in preventing symptomatic COVID-19, while being relatively safe in trial studies. However, vaccine breakthrough infections have been reported. OBJECTIVE: This study aims to identify risk factors associated with COVID-19 breakthrough infections among fully mRNA-vaccinated individuals. METHODS: We conducted a series of observational retrospective analyses using the electronic health records (EHRs) of the Columbia University Irving Medical Center/New York Presbyterian (CUIMC/NYP) up to September 21, 2021. New York City (NYC) adult residents with at least 1 polymerase chain reaction (PCR) record were included in this analysis. Poisson regression was performed to assess the association between the breakthrough infection rate in vaccinated individuals and multiple risk factors (including vaccine brand, demographics, and underlying conditions) while adjusting for calendar month, prior number of visits, and observational days in the EHR. RESULTS: The overall estimated breakthrough infection rate was 0.16 (95% CI 0.14-0.18). Individuals who were vaccinated with Pfizer/BNT162b2 (incidence rate ratio [IRR] against Moderna/mRNA-1273=1.66, 95% CI 1.17-2.35), were male (IRR against female=1.47, 95% CI 1.11-1.94), and had compromised immune systems (IRR=1.48, 95% CI 1.09-2.00) were at the highest risk for breakthrough infections. Among all underlying conditions, those with primary immunodeficiency, a history of organ transplant, an active tumor, use of immunosuppressant medications, or Alzheimer disease were at the highest risk. CONCLUSIONS: Although we found both mRNA vaccines were effective, Moderna/mRNA-1273 had a lower incidence rate of breakthrough infections. Immunocompromised and male individuals were among the highest risk groups experiencing breakthrough infections.
Given the rapidly changing nature of the SARS-CoV-2 pandemic, continued monitoring and a generalizable analysis pipeline are warranted to inform quick updates on vaccine effectiveness in real time.


Subject(s)
2019-nCoV Vaccine mRNA-1273 , BNT162 Vaccine , COVID-19 , 2019-nCoV Vaccine mRNA-1273/administration & dosage , Adult , BNT162 Vaccine/administration & dosage , COVID-19/epidemiology , COVID-19/prevention & control , Female , Humans , Male , New York City/epidemiology , Retrospective Studies , Risk Factors
7.
J Am Med Inform Assoc ; 29(7): 1161-1171, 2022 06 14.
Article in English | MEDLINE | ID: covidwho-1795239

ABSTRACT

OBJECTIVE: To combine machine efficiency and human intelligence for converting complex clinical trial eligibility criteria text into cohort queries. MATERIALS AND METHODS: Criteria2Query (C2Q) 2.0 was developed to enable real-time user intervention for criteria selection and simplification, parsing error correction, and concept mapping. The accuracy, precision, recall, and F1 score of enhanced modules for negation scope detection, temporal and value normalization were evaluated using a previously curated gold standard, the annotated eligibility criteria of 1010 COVID-19 clinical trials. The usability and usefulness were evaluated by 10 research coordinators in a task-oriented usability evaluation using 5 Alzheimer's disease trials. Data were collected by user interaction logging, a demographic questionnaire, the Health Information Technology Usability Evaluation Scale (Health-ITUES), and a feature-specific questionnaire. RESULTS: The accuracies of negation scope detection, temporal and value normalization were 0.924, 0.916, and 0.966, respectively. C2Q 2.0 achieved a moderate usability score (3.84 out of 5) and a high learnability score (4.54 out of 5). On average, 9.9 modifications were made for a clinical study. Experienced researchers made more modifications than novice researchers. The most frequent modification was deletion (5.35 per study). Furthermore, the evaluators favored cohort queries resulting from modifications (score 4.1 out of 5) and the user engagement features (score 4.3 out of 5). DISCUSSION AND CONCLUSION: Features to engage domain experts and to overcome the limitations in automated machine output are shown to be useful and user-friendly. We concluded that human-computer collaboration is key to improving the adoption and user-friendliness of natural language processing.


Subject(s)
COVID-19 , Artificial Intelligence , Eligibility Determination/methods , Humans , Natural Language Processing , Patient Selection
8.
J Med Internet Res ; 23(9): e31122, 2021 09 30.
Article in English | MEDLINE | ID: covidwho-1459209

ABSTRACT

BACKGROUND: COVID-19 has threatened the health of tens of millions of people all over the world. Massive research efforts have been made in response to the COVID-19 pandemic. Utilization of clinical data can accelerate these research efforts to combat the pandemic since important characteristics of the patients are often found by examining the clinical data. Publicly accessible clinical data on COVID-19, however, remain limited despite the immediate need. OBJECTIVE: To provide shareable clinical data to catalyze COVID-19 research, we present Columbia Open Health Data for COVID-19 Research (COHD-COVID), a publicly accessible database providing clinical concept prevalence, clinical concept co-occurrence, and clinical symptom prevalence for hospitalized patients with COVID-19. COHD-COVID also provides data on hospitalized patients with influenza and general hospitalized patients as comparator cohorts. METHODS: The data used in COHD-COVID were obtained from NewYork-Presbyterian/Columbia University Irving Medical Center's electronic health records database. Condition, drug, and procedure concepts were obtained from the visits of identified patients from the cohorts. Rare concepts were excluded, and the true concept counts were perturbed using Poisson randomization to protect patient privacy. Concept prevalence, concept prevalence ratio, concept co-occurrence, and symptom prevalence were calculated using the obtained concepts. RESULTS: Concept prevalence and concept prevalence ratio analyses showed the clinical characteristics of the COVID-19 cohorts, confirming the well-known characteristics of COVID-19 (eg, acute lower respiratory tract infection and cough). The concepts related to the well-known characteristics of COVID-19 recorded high prevalence and high prevalence ratio in the COVID-19 cohort compared to the hospitalized influenza cohort and general hospitalized cohort. Concept co-occurrence analyses showed potential associations between specific concepts. 
For acute lower respiratory tract infection in the COVID-19 cohort, a high co-occurrence ratio was obtained with COVID-19-related concepts and commonly used drugs (eg, disease due to coronavirus and acetaminophen). Symptom prevalence analysis indicated symptom-level characteristics of the cohorts and confirmed that well-known symptoms of COVID-19 (eg, fever, cough, and dyspnea) showed higher prevalence in the COVID-19 cohort than in the hospitalized influenza cohort and the general hospitalized cohort. CONCLUSIONS: We present COHD-COVID, a publicly accessible database providing useful clinical data for hospitalized patients with COVID-19, hospitalized patients with influenza, and general hospitalized patients. We expect COHD-COVID to provide researchers and clinicians with quantitative measures of COVID-19-related clinical features to better understand and combat the pandemic.


Subject(s)
COVID-19 , Influenza, Human , Databases, Factual , Humans , Influenza, Human/epidemiology , Pandemics , SARS-CoV-2
9.
Appl Clin Inform ; 12(4): 816-825, 2021 08.
Article in English | MEDLINE | ID: covidwho-1397950

ABSTRACT

BACKGROUND: Clinical trials are the gold standard for generating robust medical evidence, but clinical trial results often raise generalizability concerns, which can be attributed to the lack of population representativeness. Electronic health record (EHR) data are useful for estimating the population representativeness of clinical trial study populations. OBJECTIVES: This research aims to estimate the population representativeness of clinical trials systematically using EHR data during the early design stage. METHODS: We present an end-to-end analytical framework for transforming free-text clinical trial eligibility criteria into executable database queries conformant with the Observational Medical Outcomes Partnership Common Data Model and for systematically quantifying the population representativeness for each clinical trial. RESULTS: Using this framework, we calculated the population representativeness of 782 novel coronavirus disease 2019 (COVID-19) trials and 3,827 type 2 diabetes mellitus (T2DM) trials in the United States. With the use of overly restrictive eligibility criteria, 85.7% of the COVID-19 trials and 30.1% of T2DM trials had poor population representativeness. CONCLUSION: This research demonstrates the potential of using EHR data to assess the population representativeness of clinical trials, providing data-driven metrics to inform the selection and optimization of eligibility criteria.


Subject(s)
COVID-19 , Diabetes Mellitus, Type 2 , Electronic Health Records , Humans , Patient Selection , SARS-CoV-2 , United States
10.
J Am Med Inform Assoc ; 28(11): 2461-2466, 2021 10 12.
Article in English | MEDLINE | ID: covidwho-1343694

ABSTRACT

Hundreds of interventional clinical trials have been launched in the United States to identify effective treatment strategies for combating the coronavirus disease 2019 (COVID-19) pandemic. However, to date, only a small fraction of these trials have completed enrollment, delaying the scientific investigation of COVID-19 and its treatment options. This study presents novel metrics to examine the geographic alignment between COVID-19 hotspots and interventional clinical trial sites and evaluate trial access over time during the evolving pandemic. Using temporal COVID-19 case data from USAFacts.org and trial data from ClinicalTrials.gov, U.S. counties were categorized based on their numbers of cases and trials. Our analysis suggests that alignment and access have worsened as the pandemic shifted over time. We recommend strategies and metrics to evaluate the alignment between cases and trials. Future studies are warranted to investigate the impact of the misalignment of cases and clinical trial sites on clinical trial recruitment.


Subject(s)
COVID-19 , Clinical Trials as Topic , Humans , Pandemics , SARS-CoV-2 , Treatment Outcome , United States
11.
J Am Med Inform Assoc ; 28(8): 1703-1711, 2021 07 30.
Article in English | MEDLINE | ID: covidwho-1217859

ABSTRACT

OBJECTIVE: We introduce Medical evidence Dependency (MD)-informed attention, a novel neuro-symbolic model for understanding free-text clinical trial publications with generalizability and interpretability. MATERIALS AND METHODS: We trained one head in the multi-head self-attention model to attend to the Medical evidence Dependency (MD) and to pass linguistic and domain knowledge on to later layers (MD informed). This MD-informed attention model was integrated into BioBERT and tested on 2 public machine reading comprehension benchmarks for clinical trial publications: Evidence Inference 2.0 and PubMedQA. We also curated a small set of recently published articles reporting randomized controlled trials on COVID-19 (coronavirus disease 2019) following the Evidence Inference 2.0 guidelines to evaluate the model's robustness to unseen data. RESULTS: The integration of the MD-informed attention head improves BioBERT substantially in both benchmark tasks (by as much as +30% in the F1 score) and achieves new state-of-the-art performance on Evidence Inference 2.0. It achieves 84% and 82% in overall accuracy and F1 score, respectively, on the unseen COVID-19 data. CONCLUSIONS: MD-informed attention empowers neural reading comprehension models with interpretability and generalizability via reusable domain knowledge. Its compositionality can benefit any transformer-based architecture for machine reading comprehension of free-text medical evidence.


Subject(s)
Artificial Intelligence , Clinical Trials as Topic , Information Storage and Retrieval/methods , Models, Neurological , Natural Language Processing , COVID-19 , Computer Simulation , Data Mining , Humans , Software
12.
J Biomed Inform ; 118: 103790, 2021 06.
Article in English | MEDLINE | ID: covidwho-1196724

ABSTRACT

Clinical trials are essential for generating reliable medical evidence, but often suffer from expensive and delayed patient recruitment because the unstructured eligibility criteria description prevents automatic query generation for eligibility screening. In response to the COVID-19 pandemic, many trials have been created but their information is not computable. We included 700 COVID-19 trials available at the point of study and developed a semi-automatic approach to generate an annotated corpus for COVID-19 clinical trial eligibility criteria called COVIC. A hierarchical annotation schema based on the OMOP Common Data Model was developed to accommodate four levels of annotation granularity: study cohort, eligibility criteria, named entity, and standard concept. In COVIC, 39 trials with more than one study cohort were identified and labelled with an identifier for each cohort. 1,943 criteria for non-clinical characteristics such as "informed consent" and "exclusivity of participation" were annotated. 9,767 criteria were represented by 18,161 entities in 8 domains, 7,743 attributes of 7 attribute types, and 16,443 relationships of 11 relationship types. 17,171 entities were mapped to standard medical concepts and 1,009 attributes were normalized into computable representations. COVIC can serve as a corpus indexed by semantic tags for COVID-19 trial search and analytics, and a benchmark for machine learning based criteria extraction.


Subject(s)
COVID-19 , Clinical Trials as Topic , Computer Simulation , Eligibility Determination , Humans , Machine Learning , Pandemics
13.
J Am Med Inform Assoc ; 28(1): 14-22, 2021 01 15.
Article in English | MEDLINE | ID: covidwho-1066364

ABSTRACT

OBJECTIVE: This research aims to evaluate the impact of eligibility criteria on recruitment and observable clinical outcomes of COVID-19 clinical trials using electronic health record (EHR) data. MATERIALS AND METHODS: On June 18, 2020, we identified frequently used eligibility criteria from all the interventional COVID-19 trials in ClinicalTrials.gov (n = 288), including age, pregnancy, oxygen saturation, alanine/aspartate aminotransferase, platelets, and estimated glomerular filtration rate. We applied the frequently used criteria to the EHR data of COVID-19 patients in Columbia University Irving Medical Center (CUIMC) (March 2020-June 2020) and evaluated their impact on patient accrual and the occurrence of a composite endpoint of mechanical ventilation, tracheostomy, and in-hospital death. RESULTS: There were 3251 patients diagnosed with COVID-19 from the CUIMC EHR included in the analysis. The median follow-up period was 10 days (interquartile range 4-28 days). The composite events occurred in 18.1% (n = 587) of the COVID-19 cohort during the follow-up. In a hypothetical trial with common eligibility criteria, 33.6% (690/2051) were eligible among patients with evaluable data and 22.2% (153/690) had the composite event. DISCUSSION: By adjusting the thresholds of common eligibility criteria based on the characteristics of COVID-19 patients, we could observe more composite events from fewer patients. CONCLUSIONS: This research demonstrated the potential of using the EHR data of COVID-19 patients to inform the selection of eligibility criteria and their thresholds, supporting data-driven optimization of participant selection towards improved statistical power of COVID-19 trials.


Subject(s)
COVID-19/therapy , Clinical Trials as Topic , Electronic Health Records , Eligibility Determination , Adolescent , Adult , Aged, 80 and over , COVID-19/mortality , Female , Hospital Mortality , Humans , Male , Middle Aged , Oxygen/blood , Patient Selection , Pregnancy , Research Design , Respiration, Artificial , SARS-CoV-2 , Tracheostomy , Treatment Outcome , Young Adult
14.
J Am Med Inform Assoc ; 28(3): 616-621, 2021 03 01.
Article in English | MEDLINE | ID: covidwho-936404

ABSTRACT

Clinical trials are the gold standard for generating reliable medical evidence. The biggest bottleneck in clinical trials is recruitment. To facilitate recruitment, tools for patient search of relevant clinical trials have been developed, but users often suffer from information overload. With nearly 700 coronavirus disease 2019 (COVID-19) trials conducted in the United States as of August 2020, it is imperative to enable rapid recruitment to these studies. The COVID-19 Trial Finder was designed to facilitate patient-centered search of COVID-19 trials, first by location and radius distance from trial sites, and then by brief, dynamically generated medical questions to allow users to prescreen their eligibility for nearby COVID-19 trials with minimum human-computer interaction. A simulation study using 20 publicly available patient case reports demonstrates its precision and effectiveness.


Subject(s)
COVID-19 , Clinical Trials as Topic , Abstracting and Indexing , Adult , Aged , Aged, 80 and over , Child, Preschool , Eligibility Determination , Female , Humans , Information Storage and Retrieval , Male , Middle Aged , Patient Selection